Automatic Identification of Support Verbs: A Step Towards a Definition of Semantic Weight
Author
Abstract
Current measures of the readability of texts are very simplistic, typically based on counts of words or syllables per sentence. A more sophisticated analysis needs to take account of the fact that the particular distribution of meanings across the wordings chosen by the writer, and the consequent variations in syntactic structure, have a significant effect on readability. A step towards the required sophistication is provided by the notion of lexical density (Halliday, 1985), which suggests that different words carry different amounts of semantic weight; this idea of semantic weight is also used implicitly in areas such as information retrieval and authorship attribution. Current definitions of lexical density and semantic weight are based on the division of words into closed and open classes, and on intuition. This paper develops a computationally tractable definition of semantic weight, concentrating on what it means for a word to be semantically light; the definition involves looking at the frequency of a word in particular syntactic constructions that are indicative of lightness. Verbs such as make and take, when they function as support verbs, are often considered to be semantically light. To test our definition, we carried out an experiment based on that of Grefenstette and Teufel (1995), in which we automatically identified light instances of these words in a corpus; this was done by incorporating our frequency-related definition of semantic weight into a statistical approach similar to that of Grefenstette and Teufel. The results show that this is a plausible definition of semantic lightness for verbs, which can possibly be extended to defining semantic lightness for other classes of words.
*Reprinted with kind permission from: "Automatic Identification of Support Verbs: A Step Towards a Definition of Semantic Weight" in Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence (World Scientific, Singapore, 1995), pp. 451-458.
Copyright by World Scientific Publishing Co. Pte, 1995.
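As a rough illustration of the closed-class/open-class baseline that the abstract describes as the current basis for lexical density, the following minimal sketch computes the share of open-class tokens in a text. It is not the paper's method: the `CLOSED_CLASS` word list, the `tokenize` helper, and the ratio-based formula are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny closed-class list (an assumption for this
# sketch); a real study would use a part-of-speech tagger or a much fuller
# function-word inventory.
CLOSED_CLASS = frozenset({
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "for",
    "with", "by", "at", "is", "are", "was", "were", "be", "it", "this",
    "that", "he", "she", "they", "we", "i", "you",
})


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_density(text):
    """Return the fraction of tokens that are not in CLOSED_CLASS."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    open_class = [t for t in tokens if t not in CLOSED_CLASS]
    return len(open_class) / len(tokens)


print(lexical_density("She made a decision"))
print(lexical_density("She decided"))
```

Under this crude count, made in "She made a decision" is treated as a full open-class item, just like decided. That is the kind of case the abstract's notion of semantically light support verbs is meant to address.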
Similar resources
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs
The existing literature on Bantu verbal semantics has demonstrated that the inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages such as Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of a semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...
Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic 'se' in Portuguese
A challenging topic in Portuguese language processing is the multifunctional and ambiguous use of the clitic pronoun se, which impacts NLP tasks such as syntactic parsing, semantic role labeling and machine translation. Aiming to take a step towards the automatic disambiguation of se, our study focuses on the identification of pronominal verbs, which correspond to one of the six uses of...
Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity
Objective: Evidence from brain trauma suggests that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...
Towards Automatic Translation of Support Verbs Constructions: the Case of Polish robić/zrobić and Swedish göra
Support verb constructions range from idiosyncratic to predictable. Lexical functions provide a solution to translation of idiosyncratic constructions only. Our corpus research aims to contribute to automatic translation of support verb constructions where the verb selects certain semantic groups of collocates, and where novel collocations can be expected. We investigate samples of support verb...
Journal title:
- CoRR
Volume: abs/cmp-lg/9510007, Issue: -
Pages: -
Publication date: 1995